590 research outputs found

    On Local Regret

    Online learning aims to perform nearly as well as the best hypothesis in hindsight. For some hypothesis classes, though, even finding the best hypothesis offline is challenging. In such offline cases, local search techniques are often employed and only local optimality guaranteed. For online decision-making with such hypothesis classes, we introduce local regret, a generalization of regret that aims to perform nearly as well as only nearby hypotheses. We then present a general algorithm to minimize local regret with arbitrary locality graphs. We also show how the graph structure can be exploited to drastically speed learning. These algorithms are then demonstrated on a diverse set of online problems: online disjunct learning, online Max-SAT, and online decision tree learning. Comment: This is the longer version of the same-titled paper appearing in the Proceedings of the Twenty-Ninth International Conference on Machine Learning (ICML), 201

    Solving Imperfect Information Games Using Decomposition

    Decomposition, i.e. independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage. Comment: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations and theor

    Solving Large Extensive-Form Games with Strategy Constraints

    Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information. Comment: Appeared in AAAI 201
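
    For readers unfamiliar with Counterfactual Regret Minimization, the sketch below shows regret matching, the standard update that ordinary CFR applies at each decision point: actions are played in proportion to their positive cumulative regret. It is not the constrained generalization the abstract describes, and the function name and list-based representation are illustrative assumptions.

```python
def regret_matching(cumulative_regret):
    """Return a strategy proportional to positive cumulative regret.

    Illustrative helper (hypothetical name): this is the plain
    regret-matching rule, not the constrained variant from the abstract.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]


# Two actions share all of the positive regret, so each gets half the mass.
strategy = regret_matching([2.0, -1.0, 2.0])
uniform = regret_matching([-1.0, -2.0])
```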

    Count-Based Exploration with the Successor Representation

    In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states. Here we show that the norm of the SR, while it is being learned, can be used as a reward bonus to incentivize exploration. In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each state (or feature) has been observed. We use this result to introduce an algorithm that performs as well as some theoretically sample-efficient approaches. Finally, we extend these ideas to a deep RL algorithm and show that it achieves state-of-the-art performance in Atari 2600 games when in a low sample-complexity regime. Comment: This paper appears in the Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020
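
    The idea that the norm of a partially learned SR reflects visitation can be illustrated with a small tabular sketch. It assumes a zero-initialized SR learned by ordinary temporal-difference updates, and it uses the inverse of the L1 norm as the bonus; the function names, step size, discount, the `eps` term, and the exact bonus form are all assumptions for illustration, not the paper's algorithm or its substochastic SR.

```python
def td_update_sr(psi, s, s_next, alpha=0.1, gamma=0.9):
    """One TD update of the tabular SR row for state s (in place)."""
    for j in range(len(psi)):
        indicator = 1.0 if j == s else 0.0
        target = indicator + gamma * psi[s_next][j]
        psi[s][j] += alpha * (target - psi[s][j])


def l1_norm(row):
    return sum(abs(x) for x in row)


n_states = 4
psi = [[0.0] * n_states for _ in range(n_states)]  # SR starts at zero

# State 0 is left ten times, state 2 once, states 1 and 3 never.
for _ in range(10):
    td_update_sr(psi, 0, 1)
td_update_sr(psi, 2, 3)

norms = [l1_norm(row) for row in psi]

# Hypothetical bonus: large for rarely visited states, small for frequent
# ones. eps only guards against division by zero for unvisited states.
eps = 1e-6
bonus = [1.0 / (n + eps) for n in norms]
```

    In this toy run the row norm grows with the number of times a state has been left, so the assumed bonus is largest for the states that were never updated.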

    The Case Against Employment Tester Standing Under Title VII and 42 U.S.C. § 1981

    In 1964, Congress passed comprehensive legislation aimed at eradicating discrimination in employment, public accommodations, public facilities, public schools, and federal benefit programs. Title VII of this Act directed its aim specifically at stamping out prejudice in employment. Four years later, the Supreme Court resurrected the provisions of § 1 of the Civil Rights Act of 1866, which, among other things, protects citizens, regardless of race or color, in their right to make and enforce [employment] contracts. Together, Title VII and § 1981 serve as the primary legal bases for challenging racially discriminatory actions by private employers. More than thirty years after the passage of Title VII and the Court's resurrection of § 1981, though, society continues to feel the lingering effects of America's history of slavery and segregation in the field of employment. A study by the Urban Institute in the late 1980s and early 1990s determined that black job applicants continued to face discriminatory treatment at all levels of the hiring process. In view of the continuing effects of discrimination in employment, a number of civil rights organizations around the country have employed testing as a means of ferreting out discrimination in the hiring process